Workflow Landing Requests #18807

jmchilton · 2024-09-12T15:36:24Z

I've added a little CLI tool for generating workflow landing request URLs. External clients won't need to use this, but it is a simple example with an easy-to-follow format and flow that shows how to use the APIs to generate custom forms pre-populated with data for users.

$ . .venv/bin/activate; PYTHONPATH=lib python lib/galaxy/tool_util/client/landing.py -g http://localhost:8081 -s mycoolthing simple_workflow
Your customized form is located at http://localhost:8081/workflow_landings/0d3862e7-345f-4805-a080-5a2cbc2fc547?secret=mycoolthing
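The tool in the PR is the reference here. As a rough sketch of only its last step, the helper below turns a landing UUID and client secret into a URL of the form printed above. The function name `build_landing_url` is hypothetical, and creating the landing request itself (the API call that returns the UUID) is deliberately not shown.

```python
import uuid
from urllib.parse import urlencode


def build_landing_url(galaxy_url: str, landing_id: str, secret: str) -> str:
    """Hypothetical helper: build a landing URL in the format shown above.

    `landing_id` is assumed to be the UUID returned when the landing request
    is created; `secret` is the client secret passed with `-s`.
    """
    # Validate that the identifier really is a UUID (raises ValueError if not).
    landing_uuid = uuid.UUID(landing_id)
    base = galaxy_url.rstrip("/")
    query = urlencode({"secret": secret})
    return f"{base}/workflow_landings/{landing_uuid}?{query}"


print(build_landing_url(
    "http://localhost:8081",
    "0d3862e7-345f-4805-a080-5a2cbc2fc547",
    "mycoolthing",
))
```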

This pairs well with the new {src: "url",...} data input dictionaries and a potential minimal workflow running UI (pictured below).
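For readers unfamiliar with the new input style, here is a minimal sketch of building a {src: "url", ...} input dictionary. Only `src` and `url` come from this PR description; `ext`, `deferred` and the `hashes` layout are assumptions for illustration (the config text below does mention inputs "marked as 'deferred'", and later commits add hash validation), and `url_input` is a hypothetical helper.

```python
from typing import Optional


def url_input(url: str, ext: str, deferred: bool = False,
              sha256: Optional[str] = None) -> dict:
    """Hypothetical helper that builds a {src: "url", ...} input dictionary.

    Keys other than `src` and `url` are assumed, not taken from the PR.
    """
    request = {"src": "url", "url": url, "ext": ext, "deferred": deferred}
    if sha256 is not None:
        # Assumed shape for an optional checksum used to validate the download.
        request["hashes"] = [{"hash_function": "SHA-256", "hash_value": sha256}]
    return request


# Workflow inputs keyed by (hypothetical) input label.
inputs = {
    "input1": url_input("https://example.org/data/1.bed", "bed", deferred=True),
}
```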

The UI piece is clearly an MVP - hopefully a tool form UI expert can take it over.

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

jmchilton · 2024-09-17T15:25:37Z

This is intertwined with the tool request API, which I thought was closer to ready than it was. I'm finding so many corner cases in modeling tools. If this becomes a blocker for anyone, I think I could pretty easily pull out the database migration, the workflow landing stuff, the workflow API that adds {src: "url", ... } inputs, and the form input for the URI parameters.

Alternatively, I could pull just the tool API and backend enhancements out and push them back into #17393. We could then keep all the modeling enhancements in here and treat this a bit more like the CWL branch, where we have models we sync up but keep the plumbing out until it is solid.

Otherwise, I will just spend this week working on the models and spin smaller pieces out here and there where it makes sense.

lib/galaxy/tool_util/parameters/case.py

mvdbeek

This is some seriously cool stuff, thank you @jmchilton!

mvdbeek · 2024-09-27T14:20:34Z

lib/galaxy/config/schemas/config_schema.yml

+        required: false
+        desc: |
+          Workflows launched with URI/URL inputs that are not marked as 'deferred'
+          are "materialized" (or undeferred) by the workflow scheduler. This might be


Is it hard to do this in Celery? It feels like a feature for which we could require Celery.

I think all of that pipeline could be Celery-ified, but given that this should work as is, that feels like a second iteration. Our workflow handlers "just work" now; they could certainly be more decomposed, but that is a bigger project than we need for this feature, I think. Workflow scheduling handlers never really materialized the way I wanted; they are maybe the most pointless thing I ever attempted to make pluggable. I would be up for just scrapping it all for Celery. But again... a future project, I think.

lib/galaxy/managers/landing.py

mvdbeek · 2024-09-27T14:28:50Z

lib/galaxy/model/dereference.py

+    if len(transform) > 0:
+        dataset_source.transform = transform
+
+    sa_session.add(hda)


This might be a good spot to check whether we already have a dataset with the same source, hash, transform and owner? I think workflows with repeated inputs are going to be a thing (think cross product). This is basically a note to myself, not required for a first iteration.
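A minimal sketch of the reuse check suggested above, assuming an in-memory dictionary stands in for the real database query. `SourceKey` and `DatasetCache` are hypothetical names, and representing the transform as a tuple of action names is an assumption; the point is only that a frozen (hashable) key over source, hash, transform and owner lets repeated inputs resolve to one dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SourceKey:
    """Hypothetical identity of a URL-backed dataset for reuse purposes."""
    source_uri: str
    hash_value: Optional[str]
    transform: Tuple[str, ...]  # assumed: ordered transform action names
    owner_id: int


@dataclass
class DatasetCache:
    """In-memory stand-in for a lookup of already-created datasets."""
    _by_key: Dict[SourceKey, int] = field(default_factory=dict)
    _next_id: int = 1

    def get_or_create(self, key: SourceKey) -> Tuple[int, bool]:
        """Return (dataset_id, created), reusing an existing dataset if possible."""
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False
        dataset_id = self._next_id
        self._next_id += 1
        self._by_key[key] = dataset_id
        return dataset_id, True
```

In a cross product where the same URL feeds several steps for the same user, only the first `get_or_create` call would create a dataset; later calls return the same id with `created=False`. A different owner yields a different key, so datasets are never shared across users.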

lib/galaxy/workflow/scheduling_manager.py

mvdbeek · 2024-09-30T14:39:46Z

lib/galaxy_test/api/test_workflows.py::TestWorkflowsApi::test_run_workflow_with_invalid_url_hashes INFO:     127.0.0.1:58086 - "GET /api/users HTTP/1.1" 200 OK
INFO:     127.0.0.1:58087 - "POST /api/users/adb5f5c93f827949/api_key HTTP/1.1" 200 OK
INFO:     127.0.0.1:58088 - "GET /api/tools?in_panel=False HTTP/1.1" 200 OK
multipart.multipart DEBUG 2024-09-30 16:35:52,199 [pN:main,p:62707,tN:Thread-9 (run_in_loop)] Calling on_field_start with no data
multipart.multipart DEBUG 2024-09-30 16:35:52,199 [pN:main,p:62707,tN:Thread-9 (run_in_loop)] Calling on_field_name with data[0:4]
multipart.multipart DEBUG 2024-09-30 16:35:52,199 [pN:main,p:62707,tN:Thread-9 (run_in_loop)] Calling on_field_data with data[5:17]
multipart.multipart DEBUG 2024-09-30 16:35:52,199 [pN:main,p:62707,tN:Thread-9 (run_in_loop)] Calling on_field_end with no data
multipart.multipart DEBUG 2024-09-30 16:35:52,199 [pN:main,p:62707,tN:Thread-9 (run_in_loop)] Calling on_end with no data
INFO:     127.0.0.1:58089 - "POST /api/histories HTTP/1.1" 200 OK
INFO:     127.0.0.1:58090 - "POST /api/workflows/upload HTTP/1.1" 200 OK
galaxy.workflow.run_request INFO 2024-09-30 16:35:52,319 [pN:main,p:62707,tN:AnyIO worker thread] Creating a step_state for step.id 471
galaxy.workflow.run_request INFO 2024-09-30 16:35:52,319 [pN:main,p:62707,tN:AnyIO worker thread] Creating a step_state for step.id 472
galaxy.workflow.run_request INFO 2024-09-30 16:35:52,319 [pN:main,p:62707,tN:AnyIO worker thread] Creating a step_state for step.id 473
galaxy.web_stack.handlers INFO 2024-09-30 16:35:52,320 [pN:main,p:62707,tN:AnyIO worker thread] (WorkflowInvocation[unflushed]) Handler '_default_' assigned using 'HANDLER_ASSIGNMENT_METHODS.DB_SKIP_LOCKED' assignment method
INFO:     127.0.0.1:58091 - "POST /api/workflows/f4df8294d9246e23/invocations HTTP/1.1" 200 OK
INFO:     127.0.0.1:58092 - "GET /api/invocations/f356c15ec7800da0 HTTP/1.1" 200 OK
galaxy.jobs.handler DEBUG 2024-09-30 16:35:52,520 [pN:main,p:62707,tN:WorkflowRequestMonitor.monitor_thread] Grabbed WorkflowInvocation(s): 11
galaxy.workflow.scheduling_manager DEBUG 2024-09-30 16:35:52,523 [pN:main,p:62707,tN:WorkflowRequestMonitor.monitor_thread] Attempting to schedule workflow invocation [11]
galaxy.workflow.scheduling_manager INFO 2024-09-30 16:35:52,553 [pN:main,p:62707,tN:WorkflowRequestMonitor.monitor_thread] Failed to materialize dataset for workflow 11 - HistoryDatasetAssociation <galaxy.model.HistoryDatasetAssociation(34) at 0x32f293820> in state error with null file size, this is not valid
galaxy.workflow.scheduling_manager ERROR 2024-09-30 16:35:52,554 [pN:main,p:62707,tN:WorkflowRequestMonitor.monitor_thread] An exception occured scheduling while scheduling workflows
Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 348, in __attempt_materialize
    self.app.hda_manager.materialize(task_request, in_place=True)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/managers/hdas.py", line 200, in materialize
    session.commit()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/scoping.py", line 597, in commit
    return self._proxied.commit()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2028, in commit
    trans.commit(_to_root=True)
  File "<string>", line 2, in commit
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1313, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1288, in _prepare_impl
    self.session.flush()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4352, in flush
    self._flush(objects)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4380, in _flush
    self.dispatch.before_flush(self, flush_context, objects)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/event/attr.py", line 378, in __call__
    fn(*args, **kw)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/model/base.py", line 184, in before_flush
    for obj in versioned_objects_strict(session.dirty):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/model/base.py", line 171, in versioned_objects_strict
    obj.__strict_check_before_flush__()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/model/__init__.py", line 5272, in __strict_check_before_flush__
    raise Exception(
Exception: HistoryDatasetAssociation <galaxy.model.HistoryDatasetAssociation(34) at 0x32f293820> in state error with null file size, this is not valid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 322, in __monitor
    self.__schedule(workflow_scheduler_id, workflow_scheduler)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 332, in __schedule
    self.__attempt_schedule(invocation_id, workflow_scheduler)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 365, in __attempt_schedule
    if not self.__attempt_materialize(workflow_invocation, session):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 358, in __attempt_materialize
    session.commit()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2028, in commit
    trans.commit(_to_root=True)
  File "<string>", line 2, in commit
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1313, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1288, in _prepare_impl
    self.session.flush()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4352, in flush
    self._flush(objects)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4380, in _flush
    self.dispatch.before_flush(self, flush_context, objects)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.10/site-packages/sqlalchemy/event/attr.py", line 378, in __call__
    fn(*args, **kw)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/model/base.py", line 184, in before_flush
    for obj in versioned_objects_strict(session.dirty):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/model/base.py", line 171, in versioned_objects_strict
    obj.__strict_check_before_flush__()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/model/__init__.py", line 5272, in __strict_check_before_flush__
    raise Exception(
Exception: HistoryDatasetAssociation <galaxy.model.HistoryDatasetAssociation(34) at 0x32f293820> in state error with null file size, this is not valid

This repeats across all the tests, so I think we keep trying to schedule the invocation unless we also fail it.
The other thing is that we need to set the file size before we commit a terminal state.
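A minimal sketch of the two fixes requested above, using plain dataclasses as hypothetical stand-ins for the HDA and the invocation (this is not Galaxy's model code). On a materialization error it sets a file size together with the terminal `error` state, so a strict pre-flush check like the one in the traceback would not see "error with null file size", and it fails the invocation so the monitor loop stops retrying it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hda:
    """Hypothetical stand-in for a HistoryDatasetAssociation."""
    state: str = "deferred"
    file_size: Optional[int] = None


@dataclass
class Invocation:
    """Hypothetical stand-in for a WorkflowInvocation."""
    state: str = "ready"
    failure_reason: Optional[str] = None


def materialize_or_fail(hda: Hda, invocation: Invocation,
                        fetch: Callable[[], int]) -> bool:
    """Try to materialize `hda`; on error leave both objects in a valid terminal state.

    `fetch` is a hypothetical download callable returning the number of bytes.
    Returns True if scheduling may continue, False if the invocation failed.
    """
    try:
        hda.file_size = fetch()
        hda.state = "ok"
        return True
    except Exception as exc:
        # Give the dataset a file size alongside the terminal state, before
        # anything is committed, so "error with null file size" cannot occur.
        hda.file_size = 0
        hda.state = "error"
        # Fail the invocation so the scheduler stops picking it up every loop.
        invocation.state = "failed"
        invocation.failure_reason = f"dataset materialization failed: {exc}"
        return False
```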

lib/galaxy/workflow/scheduling_manager.py

jmchilton · 2024-09-30T16:55:29Z

Gotcha. I will redo this to fail the invocation on materialization error.


jmchilton · 2024-09-30T17:40:11Z

@mvdbeek The workflow failure stuff is so cool - let me know if I used it right. I guess we might want a more specific "reason" code.

mvdbeek · 2024-10-01T16:31:37Z

lib/galaxy/workflow/scheduling_manager.py

+                if not self.__attempt_materialize(workflow_invocation, session):
+                    return None
+                if self.app.config.workflow_scheduling_separate_materialization_iteration:
+                    return None


Does this have any measurable effect? I would imagine that materializing even a small dataset takes much longer than scheduling a workflow. As I understand it, we now spend a lot of time materializing a number of inputs (unless they are deferred), and then, depending on this setting, we proceed (or not) to schedule the workflow.
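To make the question concrete, here is a sketch of the check order in the diff above, with callables standing in for the real scheduler methods. Only the option name `workflow_scheduling_separate_materialization_iteration` comes from the diff; `needs_materialization` is an assumption, since the enclosing condition is not visible in the snippet (without something like it, the flag would stop the invocation from ever being scheduled).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SchedulerConfig:
    # Option name from the diff; the default value here is an assumption.
    workflow_scheduling_separate_materialization_iteration: bool = False


def attempt_schedule(config: SchedulerConfig,
                     needs_materialization: bool,
                     materialize: Callable[[], bool],
                     schedule: Callable[[], str]) -> Optional[str]:
    """Sketch of the diff's control flow; returns None if nothing was scheduled."""
    if needs_materialization:  # assumed guard, not shown in the diff
        if not materialize():
            # Materialization failed (invocation already failed); stop here.
            return None
        if config.workflow_scheduling_separate_materialization_iteration:
            # Hand control back to the monitor loop; schedule on a later pass.
            return None
    return schedule()
```

With the flag on, an invocation with URL inputs takes two passes of the monitor loop: one that only materializes and one that schedules. With the flag off, both happen in the same pass, which is exactly where the slow download would sit inside the scheduling loop.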

mvdbeek · 2024-10-01T16:38:14Z

We can always get more specific later on; that part is fine. I would totally like to merge this now, but I worry that we might make scheduling performance very unpredictable if we undefer in the scheduling loop. I don't think it's a ton of work to move that into Celery, so I'll give that a crack. Worst case, we only allow deferred inputs in the landing API?
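The "only allow deferred inputs" fallback could be as simple as a validation pass over the landing request's inputs. This is a hypothetical sketch, not an existing Galaxy check: `non_deferred_url_inputs` is an invented name, it assumes inputs missing a `deferred` key count as not deferred, and it only inspects top-level inputs (nested collections or repeats are not handled).

```python
from typing import Any, Dict, List


def non_deferred_url_inputs(request_state: Dict[str, Any]) -> List[str]:
    """Return names of top-level {src: "url"} inputs not marked deferred.

    Hypothetical policy check: a landing API restricted to deferred inputs
    could reject a request whenever this list is non-empty.
    """
    offenders = []
    for name, value in request_state.items():
        is_url_input = isinstance(value, dict) and value.get("src") == "url"
        # Assumption: a missing "deferred" key means the input is not deferred.
        if is_url_input and not value.get("deferred", False):
            offenders.append(name)
    return offenders
```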

jmchilton · 2024-10-01T18:44:48Z

“Premature optimization is the root of all evil” -Tony Hoare. We should let it kill a handler before we worry - we have no clue if the feature will ever be used or if it would be a problem for the use cases we have in mind. Deferred data, download caches, no one using the feature... all might negate the value in that effort.

mvdbeek · 2024-10-01T19:44:20Z

I see this more from the angle of: can one unprivileged user break scheduling for everyone else? My experience with these niche features is that they will eventually be used, and at that point admins will not know why their schedulers are stuck.

jmchilton · 2024-10-01T22:42:54Z

I've written many niche features for Galaxy no one has ever used 😅. First sign of trouble and I will offer to redo it in Celery.

jmchilton added kind/enhancement area/API area/workflows labels Sep 12, 2024

jmchilton force-pushed the landing branch 12 times, most recently from 78f6520 to d723581, September 16, 2024 23:52

jmchilton force-pushed the landing branch 12 times, most recently from 64d2efa to 27359a0, September 20, 2024 14:13

github-advanced-security bot found potential problems Sep 20, 2024


lib/galaxy/tool_util/parameters/case.py (fixed)

jmchilton force-pushed the landing branch from 27359a0 to 7d51f7c, September 20, 2024 15:01

mvdbeek approved these changes Sep 27, 2024


jmchilton force-pushed the landing branch from ac6ea5e to 6494c62, September 30, 2024 13:15

mvdbeek reviewed Sep 30, 2024


lib/galaxy/workflow/scheduling_manager.py (outdated, resolved)

lib/galaxy/workflow/scheduling_manager.py (outdated, resolved)

jmchilton added 14 commits September 30, 2024 13:25

Fixes and tests for data fetch models.

dc4b59d

More tests for various data column feature combos.

b367c8a

Is it a bug that column params need to be strings?

c0921f6

Database migration for tool request API.

16c5a4f

Tool Request models.

ad73e9e

Tool Request API - DB Layer.

3cf8db8

Plumbing for dereferencing URLs into datasets.

80aaff8

Allow workflows to consume deferred {src: url} dicts.

2fe88a0

Implement workflow landings.

660f78f

Includes a lot of plumbing for tools but not the APIs - they are not ready to go yet.

Schema updates for landing.

8eb493a

Allow dataset and source checksum validation during materialization.

93f2af6

Update schema for validating hashes arguments.

edb1cf8

PR comments.

3bf8a80

Revise workflow failure around materializing URLs.

f80207d

jmchilton force-pushed the landing branch from 78f8f2a to f80207d, September 30, 2024 17:38

mvdbeek reviewed Oct 1, 2024


mvdbeek merged commit 2ead0c4 into galaxyproject:dev Oct 1, 2024
56 checks passed

jdavcs added the highlight label (included in user-facing release notes at the top) Nov 20, 2024

jdavcs added the highlight/dev label (included in admin/dev release notes) and removed the highlight label (included in user-facing release notes at the top) Dec 19, 2024
